Programming with StarPU

Author

Ludovic Courtès

Abstract

Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and "accelerators", often in the form of graphics processing units (GPUs). StarPU is a C library to exploit such platforms. It provides users with ways to define tasks to be executed on CPUs or GPUs, along with the dependencies among them, and automatically schedules those tasks over all the available processing units. In doing so, it also relieves programmers from the need to know the underlying architecture details: it adapts to the available CPUs and GPUs, and automatically transfers data between main memory and GPUs as needed.

While StarPU's approach is successful at addressing run-time scheduling issues, being a C library makes for a poor and error-prone programming interface. This paper presents an effort started in 2011 to promote some of the concepts exposed by the library as C language constructs, by means of an extension of the GCC compiler suite. Our main contribution is the design and implementation of language extensions that map to StarPU's task programming paradigm. We argue that the proposed extensions make it easier to get started with StarPU, eliminate errors that can occur when using the C library, and help diagnose possible mistakes. We conclude with future work.

Key-words: parallel programming, GPU, scheduling, programming language support

hal-00807033, version 2, 5 Apr 2013
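The task model described in the abstract (tasks plus the dependencies among them, with the runtime choosing the execution order) can be illustrated with a small sketch. This is a toy, sequential model in plain C written for illustration only: it uses neither StarPU nor the GCC extensions, and every name in it (`toy_task`, `add_task`, `add_dependency`, `run_all`, `toy_demo`) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_TASKS 8

/* Hypothetical toy types and names; none of them come from StarPU. */
typedef void (*task_fn)(int id);

struct toy_task {
    task_fn run;                 /* function to execute */
    int pending_deps;            /* number of unfinished predecessors */
    int successors[MAX_TASKS];   /* tasks that wait for this one */
    int n_successors;
    int done;
};

static struct toy_task tasks[MAX_TASKS];
static int n_tasks;
static int order[MAX_TASKS];     /* ids in the order they actually ran */
static int n_run;

/* Task body used by the demo: it only records that it ran. */
static void record(int id) { order[n_run++] = id; }

static int add_task(task_fn fn) {
    assert(n_tasks < MAX_TASKS);
    memset(&tasks[n_tasks], 0, sizeof tasks[n_tasks]);
    tasks[n_tasks].run = fn;
    return n_tasks++;
}

/* Declare that task `after` may only run once task `before` has finished. */
static void add_dependency(int before, int after) {
    assert(tasks[before].n_successors < MAX_TASKS);
    tasks[before].successors[tasks[before].n_successors++] = after;
    tasks[after].pending_deps++;
}

/* Repeatedly run every task whose dependencies are all satisfied.
   Returns the number of tasks executed. */
static int run_all(void) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < n_tasks; i++) {
            if (tasks[i].done || tasks[i].pending_deps > 0)
                continue;
            tasks[i].run(i);
            tasks[i].done = 1;
            for (int s = 0; s < tasks[i].n_successors; s++)
                tasks[tasks[i].successors[s]].pending_deps--;
            progress = 1;
        }
    }
    return n_run;
}

/* Submit three tasks in "wrong" order; dependencies decide the real order. */
static int toy_demo(void) {
    n_tasks = 0;
    n_run = 0;
    int consume   = add_task(record);   /* id 0 */
    int transform = add_task(record);   /* id 1 */
    int produce   = add_task(record);   /* id 2 */
    add_dependency(produce, transform);
    add_dependency(transform, consume);
    return run_all();
}
```

In this sketch the submission order does not matter: `toy_demo` submits the consumer first, yet the producer runs first because it is the only task with no pending dependency. A real runtime additionally chooses a processing unit for each ready task and moves data where needed, which this toy deliberately leaves out.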

Similar resources

Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-based Systems

SkePU is a skeleton programming framework for multicore CPU and multi-GPU systems. StarPU is a runtime system that provides dynamic scheduling and memory management support for heterogeneous, accelerator-based systems. We have implemented support for StarPU as a possible backend for SkePU while keeping the generic SkePU interface intact. The mapping of a SkePU skeleton call to one or more StarP...

C Language Extensions for Hybrid CPU/GPU Programming with StarPU

StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators

GPU clusters are becoming widespread HPC platforms. Exploiting them is however challenging, as this requires two separate paradigms (MPI and CUDA or OpenCL) and careful load balancing due to node heterogeneity. Current paradigms usually either limit themselves to offloading part of the computation and leave CPUs idle, or require static CPU/GPU work partitioning. We thus have previously proposed S...

Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System

Core specialization is currently one of the most promising ways of designing power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies...

Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs

    for (n = 0; n < nt; n++)        // loop on cols
        for (m = 0; m < mt; m++)    // loop on rows
            starpu_matrix_data_register(&tile_handle[m][n], 0,
                                        &tile[m][n], M, M, N, sizeof(float));

Figure 3: Registration of the tiles as handles of matrix data type.

Initialization. When initializing StarPU with starpu_init, StarPU automatically detects the ...
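To make the registration loop in the excerpt above easier to follow, here is a minimal, self-contained sketch of the same pattern. It does not call StarPU: `toy_handle` and `toy_matrix_register` are hypothetical stand-ins that only record the description of each tile, and the sizes `MT`, `NT`, `M` and `N` are assumed values chosen for illustration. The arguments are read here as pointer, leading dimension, two dimensions and element size, which is an assumption about what the excerpt's arguments mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MT 2   /* number of tile rows (assumed for illustration) */
#define NT 3   /* number of tile columns (assumed) */
#define M  4   /* rows per tile (assumed) */
#define N  4   /* columns per tile (assumed) */

/* Hypothetical stand-in for a runtime data handle; not a StarPU type. */
struct toy_handle {
    float *ptr;        /* start of the tile in main memory */
    size_t ld;         /* leading dimension, in elements */
    size_t nx, ny;     /* tile dimensions, in elements */
    size_t elemsize;   /* size of one element, in bytes */
};

/* Stand-in for the registration call: it only records the tile layout. */
static void toy_matrix_register(struct toy_handle *h, float *ptr,
                                size_t ld, size_t nx, size_t ny,
                                size_t elemsize) {
    h->ptr = ptr;
    h->ld = ld;
    h->nx = nx;
    h->ny = ny;
    h->elemsize = elemsize;
}

/* Number of payload bytes described by one handle. */
static size_t toy_handle_bytes(const struct toy_handle *h) {
    return h->nx * h->ny * h->elemsize;
}

static float *tile[MT][NT];
static struct toy_handle tile_handle[MT][NT];

/* Allocate and register every tile; returns the total bytes registered. */
static size_t toy_register_all(void) {
    size_t total = 0;
    for (int n = 0; n < NT; n++)          /* loop on cols */
        for (int m = 0; m < MT; m++) {    /* loop on rows */
            tile[m][n] = calloc((size_t)M * N, sizeof(float));
            assert(tile[m][n] != NULL);
            toy_matrix_register(&tile_handle[m][n], tile[m][n],
                                M, M, N, sizeof(float));
            total += toy_handle_bytes(&tile_handle[m][n]);
        }
    return total;
}

/* Free all tiles allocated by toy_register_all. */
static void toy_release_all(void) {
    for (int n = 0; n < NT; n++)
        for (int m = 0; m < MT; m++) {
            free(tile[m][n]);
            tile[m][n] = NULL;
        }
}
```

The point of the pattern is that each tile gets its own handle, so a runtime can track and transfer tiles independently instead of treating the matrix as one block.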

Publication date: 2013
